AITopics | Segovia

Collaborating Authors

Segovia

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Cao, Yixin, Hong, Shibo, Li, Xinze, Ying, Jiahao, Ma, Yubo, Liang, Haiyuan, Liu, Yantao, Yao, Zijun, Wang, Xiaozhi, Huang, Dan, Zhang, Wenxuan, Huang, Lifu, Chen, Muhao, Hou, Lei, Sun, Qianru, Ma, Xingjun, Wu, Zuxuan, Kan, Min-Yen, Lo, David, Zhang, Qi, Ji, Heng, Jiang, Jing, Li, Juanzi, Sun, Aixin, Huang, Xuanjing, Chua, Tat-Seng, Jiang, Yu-Gang

arXiv.org Artificial Intelligence, Apr-29-2025

Large Language Models (LLMs) are advancing at a remarkable pace and have become indispensable across academia, industry, and daily applications. To keep up with this progress, this survey examines the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around core competencies such as knowledge, reasoning, instruction following, multi-modal understanding, and safety; and (ii) from manual to automated evaluation, encompassing dynamic dataset curation and "LLM-as-a-judge" scoring. Yet, even with these transitions, a crucial obstacle persists: the evaluation generalization issue. Bounded test sets cannot scale alongside models whose abilities grow seemingly without limit. We dissect this issue, along with the core challenges of the two transitions above, from the perspectives of methods, datasets, evaluators, and metrics. Because the field is evolving quickly, we will maintain a living GitHub repository (links are given in each section) to crowd-source updates and corrections, and we warmly invite contributors and collaborators.
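The abstract names "LLM-as-a-judge" scoring as part of the shift to automated evaluation. The snippet below is a minimal sketch of one common pairwise protocol, assuming a judge is any callable that returns "A", "B", or "tie". Querying the judge with the answers in both orders, and keeping a win only if it survives the swap, is a widely used mitigation for position bias; it is not presented here as the survey's own method. The names `judge_pair`, `length_judge`, and `first_position_judge` are hypothetical, and a real judge would call an LLM rather than compare string lengths.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]
Judge = Callable[[str, str, str], Verdict]


def judge_pair(judge: Judge, question: str, answer_1: str, answer_2: str) -> Verdict:
    """Return "A" if answer_1 wins, "B" if answer_2 wins, else "tie".

    The hypothetical judge is queried twice, once per answer order; a win is
    kept only if the verdict is consistent after mapping the swap back.
    """
    forward = judge(question, answer_1, answer_2)
    backward = judge(question, answer_2, answer_1)
    # In the swapped call, "A" refers to answer_2, so flip it back.
    backward_mapped = {"A": "B", "B": "A", "tie": "tie"}[backward]
    if forward == backward_mapped:
        return forward
    return "tie"  # order-dependent verdicts are treated as a tie


def length_judge(question: str, first: str, second: str) -> Verdict:
    """Toy stand-in judge: prefers the longer answer, ignoring position."""
    if len(first) == len(second):
        return "tie"
    return "A" if len(first) > len(second) else "B"


def first_position_judge(question: str, first: str, second: str) -> Verdict:
    """Toy stand-in judge with pure position bias: always prefers the first answer."""
    return "A"


print(judge_pair(length_judge, "q", "a long answer", "short"))
print(judge_pair(first_position_judge, "q", "x", "y"))
```

With the position-biased judge, the two orders disagree, so the pair is scored as a tie instead of rewarding whichever answer happened to be shown first.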

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2504.18838

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology (1.00)
Energy (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


TempoNet: Empowering long-term Knee Joint Angle Prediction with Dynamic Temporal Attention in Exoskeleton Control

Saoud, Lyes Saad, Hussain, Irfan

arXiv.org Artificial Intelligence, Oct-3-2023

In exoskeleton control, the mechanical delay of the exoskeleton makes precise control difficult. To address this, using future gait trajectories as feed-forward input has been proposed. However, existing deep learning models for gait prediction focus mainly on short-term predictions, leaving their long-term performance relatively unexplored. In this study, we present TempoNet, a novel model designed specifically for precise knee joint angle prediction. By harnessing dynamic temporal attention within a Transformer-based architecture, TempoNet surpasses existing models in forecasting knee joint angles over extended time horizons. Notably, our model achieves a reduction of 10% to 185% in Mean Absolute Error (MAE) for 100 ms ahead forecasting compared to other Transformer-based models, demonstrating its effectiveness. TempoNet also outperforms the baseline Transformer model by 14% in MAE for the 200 ms prediction horizon. These findings highlight the efficacy of TempoNet in accurately predicting knee joint angles and underscore the importance of incorporating dynamic temporal attention. More accurate knee joint angle prediction opens up possibilities for precise control, improved rehabilitation outcomes, advanced sports performance analysis, and deeper insights in biomechanical research. The code for TempoNet is available in the GitHub repository: https://github.com/LyesSaadSaoud/TempoNet.
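The abstract states prediction horizons in milliseconds (100 ms and 200 ms ahead). Turning such a horizon into supervised training pairs requires knowing the sensor sampling rate, which the abstract does not give. The sketch below assumes a hypothetical 100 Hz rate, so a 100 ms horizon equals 10 samples, and builds direct k-step-ahead (input window, target) pairs. The function `make_windows` and its parameters are illustrative assumptions, not TempoNet's actual data pipeline.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    input_len: int,
    horizon_ms: float,
    sample_rate_hz: float,
) -> List[Tuple[List[float], float]]:
    """Pair each input window with the value horizon_ms after its last sample.

    sample_rate_hz is an assumption supplied by the caller; the horizon in
    milliseconds is converted to a whole number of samples.
    """
    step_ms = 1000.0 / sample_rate_hz
    horizon_steps = round(horizon_ms / step_ms)
    if horizon_steps < 1:
        raise ValueError("horizon is shorter than one sample period")
    pairs: List[Tuple[List[float], float]] = []
    # Stop early enough that the target index stays inside the series.
    for start in range(len(series) - input_len - horizon_steps + 1):
        end = start + input_len  # exclusive end of the input window
        target = series[end - 1 + horizon_steps]
        pairs.append((list(series[start:end]), target))
    return pairs


# Hypothetical angle trace sampled at an assumed 100 Hz.
angles = [float(i) for i in range(20)]
pairs = make_windows(angles, input_len=5, horizon_ms=100, sample_rate_hz=100)
print(len(pairs), pairs[0])
```

Doubling the horizon to 200 ms doubles the gap between the last observed sample and the target, which is one reason long-horizon errors tend to be larger.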

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2310.01795

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States (0.04)
Europe > Spain > Castile and León > Segovia Province > Segovia (0.04)

Genre:

Research Report > Promising Solution (0.88)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Improving Knee Joint Angle Prediction through Dynamic Contextual Focus and Gated Linear Units

Saoud, Lyes Saad, Ibrahim, Humaid, Aljarah, Ahmad, Hussain, Irfan

arXiv.org Artificial Intelligence, Oct-2-2023

Accurate knee joint angle prediction is crucial for biomechanical analysis and rehabilitation. In this study, we introduce FocalGatedNet, a novel deep learning model that incorporates Dynamic Contextual Focus (DCF) Attention and Gated Linear Units (GLU) to enhance feature dependencies and interactions. Our model is evaluated on a large-scale dataset and compared with established models in multi-step gait trajectory prediction. Our results show that FocalGatedNet outperforms existing models for long-term prediction lengths (20 ms, 60 ms, 80 ms, and 100 ms), with significant improvements in Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). For the 80 ms horizon specifically, FocalGatedNet achieves an MAE reduction of up to 24%, an RMSE reduction of up to 14%, and a Mean Absolute Percentage Error (MAPE) reduction of up to 36% compared with the Transformer, highlighting its effectiveness in capturing complex knee joint angle patterns. Moreover, FocalGatedNet has a lower computational load than most comparable deep learning models, making it an efficient choice for real-time biomechanical analysis and rehabilitation applications.
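The abstract reports MAE, RMSE, and MAPE reductions relative to a Transformer baseline. The sketch below gives the standard definitions of the three metrics, plus a percentage reduction computed relative to the baseline's error; that normalization is an assumption, since the abstract does not spell out how "reduction" is defined. All function names and the sample numbers are hypothetical and illustrative only, not values from the paper.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Undefined when a true value is 0 (e.g. joint angles that cross zero);
    this sketch simply rejects such inputs.
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero-valued targets")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percentage decrease of model_error relative to baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Illustrative angles in degrees (not data from the paper).
truth = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 33.0]
print(mae(truth, pred), rmse(truth, pred), mape(truth, pred))
print(relative_reduction(2.5, 1.9))  # a baseline MAE of 2.5 reduced to 1.9
```

RMSE squares each error before averaging, so it penalizes occasional large deviations more than MAE does; reporting both, as the abstract does, shows whether gains come from fewer large errors or from uniformly smaller ones.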

artificial intelligence, focalgatednet, machine learning

arXiv.org Artificial Intelligence

2306.069

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Spain > Castile and León > Segovia Province > Segovia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Making drones suitable for cities

Robohub, May-21-2023, 08:30:42 GMT

With drone technology already well advanced, the next step is to ensure drones can fly safely in cities.

The Spanish resort town of Benidorm is known for its sandy beaches with clear waters, a skyline dominated by towering hotels, and tourists from northern Europe. But one day in February, it also served as a testing ground for European society's future with drones. Because the local economy depends on summer tourism, Benidorm is relatively empty in winter, and that is a plus for safety when testing unmanned aerial vehicles (UAVs). The tall buildings that dominate the skyline also stand in nicely for those of a big city. In sum, it is an ideal place to try out new drone technology.

drone, moreno lorente, soley

Robohub

Country:

Europe > Northern Europe (0.25)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.06)
North America > United States > New York (0.05)

Industry:

Aerospace & Defense > Aircraft (0.51)
Transportation (0.51)
Information Technology (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)


How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns

Navarro-García, Manuel, Precioso, Daniel, Gavira-O'Neill, Kathryn, Torres-Barrán, Alberto, Gordo, David, Gallego, Víctor, Gómez-Ullate, David

arXiv.org Artificial Intelligence, Jul-14-2022

As fishermen have noticed this behaviour, they have used both natural and man-made floating objects, or drifting Fish Aggregating Devices (dFADs), as a tool for finding and catching tropical tunas. The use of dFADs in tuna purse-seine fisheries has increased steadily from the 1980s to the present, and vessels using dFADs now account for 36% of the world's total tropical tuna catch (Davies et al., 2014; Wain et al., 2021; ISSF, 2021). These widespread changes have highlighted the need to better understand the potential ecological effects of dFADs on tuna and the marine environment, in order to ensure adequate management of fish stocks and dFAD usage. Indeed, both how and why tuna associate with dFADs remain poorly understood.

Regarding the reasons behind tuna aggregation around dFADs, a number of hypotheses have been suggested (Fréon and Dagorn, 2000; Dempster and Taquet, 2004; Castro et al., 2002). Of these, two have gained traction: the "meeting-point" hypothesis, which holds that dFADs facilitate encounters between individuals or schools, thus forming larger schools that could benefit survival rates (Castro et al., 2002); and the "indicator-log" hypothesis, by which tunas may safeguard the survival of their eggs, larvae, and juvenile stages by using drifting objects as indicators of areas where plankton and food are readily available (Hall et al., 1992). This scenario has led some authors to postulate that man-made dFADs could harm tuna populations by creating a so-called "ecological trap", which would lead tuna to remain associated with dFADs even as these drift into areas that could negatively affect the tuna's behaviour and biology (Marsac et al., 2000; Hallier and Gaertner, 2008). To the best of our knowledge, there is not yet sufficient evidence to either confirm or reject this hypothesis (see Dagorn et al. (2012) and references therein).

Given the concerns around the widespread use of dFADs in tuna fisheries today, it is not surprising that considerable research has been devoted to characterizing the dynamics at play when tunas aggregate around dFADs.

biomass, dfad, tuna

arXiv.org Artificial Intelligence

2207.07049

Country:

Pacific Ocean (0.06)
Indian Ocean (0.06)
Atlantic Ocean (0.05)

Genre: Research Report > New Finding (0.94)

Industry: Food & Agriculture > Fishing (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
